OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Unsupervised measures for parameter selection of binarization algorithms

Internal identifier: 000536 (Main/Exploration); previous: 000535; next: 000537

Authors: Marte A. Ramirez-Ortegon [Germany]; Edgar A. Duenez-Guzman [United States]; Raúl Rojas [Germany]; Erik Cuevas [Mexico]

Source: Pattern recognition (ISSN 0031-3203), 2011

RBID : Pascal:11-0105971


Abstract

In this paper, we propose a mechanism for systematic comparison of the efficacy of unsupervised evaluation methods for parameter selection of binarization algorithms in optical character recognition (OCR). We also analyze these measures statistically and ascertain whether a measure is suitable or not to assess a binarization method. The comparison process is streamlined in several steps. Given an unsupervised measure and a binarization algorithm we: (i) find the best parameter combination for the algorithm in terms of the measure, (ii) use the best binarization of an image on an OCR, and (iii) evaluate the accuracy of the characters detected. We also propose a new unsupervised measure and a statistical test to compare measures based on an intuitive triad of possible results: better, worse or comparable performance. The comparison method and statistical tests can be easily generalized for new measures, binarization algorithms and even other accuracy-driven tasks in image processing. Finally, we perform an extensive comparison of several well known measures, binarization algorithms and OCRs, and use it to show the strengths of the WV measure.
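
The three steps listed in the abstract can be sketched as a small parameter search. This is a minimal illustration under stated assumptions, not the authors' implementation: the binarization is a plain global threshold, the unsupervised measure is Otsu's between-class variance standing in for the measures the paper compares (the WV measure is not defined in this record), and `ocr_accuracy` is a hypothetical placeholder that scores pixel agreement with a ground-truth mask instead of running a real OCR engine. All function names and the toy image are invented for the example.

```python
from typing import Callable, Iterable, List, Tuple

Image = List[List[int]]   # grey levels, 0..255
Binary = List[List[int]]  # 1 = foreground (ink), 0 = background


def binarize(image: Image, threshold: int) -> Binary:
    """Global threshold: pixels darker than `threshold` become foreground."""
    return [[1 if px < threshold else 0 for px in row] for row in image]


def between_class_variance(image: Image, binary: Binary) -> float:
    """Stand-in unsupervised measure (Otsu's criterion); higher is better."""
    fg = [px for row, brow in zip(image, binary) for px, b in zip(row, brow) if b]
    bg = [px for row, brow in zip(image, binary) for px, b in zip(row, brow) if not b]
    if not fg or not bg:
        return 0.0  # a one-class result carries no separation at all
    n = len(fg) + len(bg)
    mean_fg = sum(fg) / len(fg)
    mean_bg = sum(bg) / len(bg)
    return (len(fg) / n) * (len(bg) / n) * (mean_fg - mean_bg) ** 2


def select_parameter(
    image: Image,
    candidates: Iterable[int],
    measure: Callable[[Image, Binary], float],
) -> Tuple[int, Binary]:
    """Step (i): keep the parameter whose binarization scores best."""
    best = max(candidates, key=lambda t: measure(image, binarize(image, t)))
    return best, binarize(image, best)


def ocr_accuracy(binary: Binary, truth: Binary) -> float:
    """Steps (ii)-(iii), hypothetical stand-in: fraction of matching pixels."""
    total = sum(len(row) for row in truth)
    hits = sum(b == t for brow, trow in zip(binary, truth) for b, t in zip(brow, trow))
    return hits / total


# Toy 2x4 image: dark "ink" pixels on a bright background.
image = [[20, 30, 200, 210], [25, 220, 215, 35]]
truth = [[1, 1, 0, 0], [1, 0, 0, 1]]

best_threshold, best_binary = select_parameter(
    image, range(0, 256, 16), between_class_variance
)
accuracy = ocr_accuracy(best_binary, truth)
```

Comparing two measures in this setup would amount to calling `select_parameter` once per measure and comparing the resulting accuracies over many images, which is where the statistical test mentioned in the abstract would come in.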



The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Unsupervised measures for parameter selection of binarization algorithms</title>
<author>
<name sortKey="Ramirez Ortegon, Marte A" sort="Ramirez Ortegon, Marte A" uniqKey="Ramirez Ortegon M" first="Marte A." last="Ramirez-Ortegon">Marte A. Ramirez-Ortegon</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>Institut für Informatik, Freie Universität Berlin, Takustr. 9</s1>
<s2>14195 Berlin</s2>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<placeName>
<region type="land" nuts="3">Berlin</region>
<settlement type="city">Berlin</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Duenez Guzman, Edgar A" sort="Duenez Guzman, Edgar A" uniqKey="Duenez Guzman E" first="Edgar A." last="Duenez-Guzman">Edgar A. Duenez-Guzman</name>
<affiliation wicri:level="4">
<inist:fA14 i1="02">
<s1>Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford St.</s1>
<s2>Cambridge, MA 02138</s2>
<s3>USA</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Massachusetts</region>
<settlement type="city">Cambridge (Massachusetts)</settlement>
</placeName>
<orgName type="university">Université Harvard</orgName>
</affiliation>
</author>
<author>
<name sortKey="Rojas, Raul" sort="Rojas, Raul" uniqKey="Rojas R" first="Raúl" last="Rojas">Raúl Rojas</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>Institut für Informatik, Freie Universität Berlin, Takustr. 9</s1>
<s2>14195 Berlin</s2>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<placeName>
<region type="land" nuts="3">Berlin</region>
<settlement type="city">Berlin</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Cuevas, Erik" sort="Cuevas, Erik" uniqKey="Cuevas E" first="Erik" last="Cuevas">Erik Cuevas</name>
<affiliation wicri:level="1">
<inist:fA14 i1="03">
<s1>Department of Computer Science, University of Guadalajara, Av. Revolución 1500</s1>
<s2>Guadalajara, Jalisco</s2>
<s3>MEX</s3>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Mexique</country>
<wicri:noRegion>Guadalajara, Jalisco</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">11-0105971</idno>
<date when="2011">2011</date>
<idno type="stanalyst">PASCAL 11-0105971 INIST</idno>
<idno type="RBID">Pascal:11-0105971</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000151</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000622</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000090</idno>
<idno type="wicri:doubleKey">0031-3203:2011:Ramirez Ortegon M:unsupervised:measures:for</idno>
<idno type="wicri:Area/Main/Merge">000542</idno>
<idno type="wicri:Area/Main/Curation">000536</idno>
<idno type="wicri:Area/Main/Exploration">000536</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">Unsupervised measures for parameter selection of binarization algorithms</title>
<author>
<name sortKey="Ramirez Ortegon, Marte A" sort="Ramirez Ortegon, Marte A" uniqKey="Ramirez Ortegon M" first="Marte A." last="Ramirez-Ortegon">Marte A. Ramirez-Ortegon</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>Institut für Informatik, Freie Universität Berlin, Takustr. 9</s1>
<s2>14195 Berlin</s2>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<placeName>
<region type="land" nuts="3">Berlin</region>
<settlement type="city">Berlin</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Duenez Guzman, Edgar A" sort="Duenez Guzman, Edgar A" uniqKey="Duenez Guzman E" first="Edgar A." last="Duenez-Guzman">Edgar A. Duenez-Guzman</name>
<affiliation wicri:level="4">
<inist:fA14 i1="02">
<s1>Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford St.</s1>
<s2>Cambridge, MA 02138</s2>
<s3>USA</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Massachusetts</region>
<settlement type="city">Cambridge (Massachusetts)</settlement>
</placeName>
<orgName type="university">Université Harvard</orgName>
</affiliation>
</author>
<author>
<name sortKey="Rojas, Raul" sort="Rojas, Raul" uniqKey="Rojas R" first="Raúl" last="Rojas">Raúl Rojas</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>Institut für Informatik, Freie Universität Berlin, Takustr. 9</s1>
<s2>14195 Berlin</s2>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<placeName>
<region type="land" nuts="3">Berlin</region>
<settlement type="city">Berlin</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Cuevas, Erik" sort="Cuevas, Erik" uniqKey="Cuevas E" first="Erik" last="Cuevas">Erik Cuevas</name>
<affiliation wicri:level="1">
<inist:fA14 i1="03">
<s1>Department of Computer Science, University of Guadalajara, Av. Revolución 1500</s1>
<s2>Guadalajara, Jalisco</s2>
<s3>MEX</s3>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Mexique</country>
<wicri:noRegion>Guadalajara, Jalisco</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">Pattern recognition</title>
<title level="j" type="abbreviated">Pattern recogn.</title>
<idno type="ISSN">0031-3203</idno>
<imprint>
<date when="2011">2011</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">Pattern recognition</title>
<title level="j" type="abbreviated">Pattern recogn.</title>
<idno type="ISSN">0031-3203</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Accuracy</term>
<term>Algorithm</term>
<term>Image processing</term>
<term>Optical character recognition</term>
<term>Pattern recognition</term>
<term>Performance evaluation</term>
<term>Signal classification</term>
<term>Statistical method</term>
<term>Statistical test</term>
<term>Unsupervised classification</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Classification non supervisée</term>
<term>Algorithme</term>
<term>Reconnaissance optique caractère</term>
<term>Méthode statistique</term>
<term>Précision</term>
<term>Test statistique</term>
<term>Evaluation performance</term>
<term>Traitement image</term>
<term>Classification signal</term>
<term>Reconnaissance forme</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr">
<term>Méthode statistique</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">In this paper, we propose a mechanism for systematic comparison of the efficacy of unsupervised evaluation methods for parameter selection of binarization algorithms in optical character recognition (OCR). We also analyze these measures statistically and ascertain whether a measure is suitable or not to assess a binarization method. The comparison process is streamlined in several steps. Given an unsupervised measure and a binarization algorithm we: (i) find the best parameter combination for the algorithm in terms of the measure, (ii) use the best binarization of an image on an OCR, and (iii) evaluate the accuracy of the characters detected. We also propose a new unsupervised measure and a statistical test to compare measures based on an intuitive triad of possible results: better, worse or comparable performance. The comparison method and statistical tests can be easily generalized for new measures, binarization algorithms and even other accuracy-driven tasks in image processing. Finally, we perform an extensive comparison of several well known measures, binarization algorithms and OCRs, and use it to show the strengths of the WV measure.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Allemagne</li>
<li>Mexique</li>
<li>États-Unis</li>
</country>
<region>
<li>Berlin</li>
<li>Massachusetts</li>
</region>
<settlement>
<li>Berlin</li>
<li>Cambridge (Massachusetts)</li>
</settlement>
<orgName>
<li>Université Harvard</li>
</orgName>
</list>
<tree>
<country name="Allemagne">
<region name="Berlin">
<name sortKey="Ramirez Ortegon, Marte A" sort="Ramirez Ortegon, Marte A" uniqKey="Ramirez Ortegon M" first="Marte A." last="Ramirez-Ortegon">Marte A. Ramirez-Ortegon</name>
</region>
<name sortKey="Rojas, Raul" sort="Rojas, Raul" uniqKey="Rojas R" first="Raúl" last="Rojas">Raúl Rojas</name>
</country>
<country name="États-Unis">
<region name="Massachusetts">
<name sortKey="Duenez Guzman, Edgar A" sort="Duenez Guzman, Edgar A" uniqKey="Duenez Guzman E" first="Edgar A." last="Duenez-Guzman">Edgar A. Duenez-Guzman</name>
</region>
</country>
<country name="Mexique">
<noRegion>
<name sortKey="Cuevas, Erik" sort="Cuevas, Erik" uniqKey="Cuevas E" first="Erik" last="Cuevas">Erik Cuevas</name>
</noRegion>
</country>
</tree>
</affiliations>
</record>
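
As a rough illustration of how the `<affiliations>` block above can be consumed, the following Python sketch groups author names by country using only the standard library. It works on a simplified excerpt of the `<tree>` element (attributes such as `sortKey` and `uniqKey` are dropped for brevity), and the function name `authors_by_country` is an assumption made for this example. Note that the full record uses `wicri:` and `inist:` prefixes without visible namespace declarations, so a namespace-aware parser would probably reject it unless those prefixes are declared or stripped first.

```python
import xml.etree.ElementTree as ET

# Simplified excerpt of the <tree> element from the record above.
TREE_XML = """
<tree>
<country name="Allemagne">
<region name="Berlin">
<name first="Marte A." last="Ramirez-Ortegon">Marte A. Ramirez-Ortegon</name>
</region>
<name first="Raúl" last="Rojas">Raúl Rojas</name>
</country>
<country name="États-Unis">
<region name="Massachusetts">
<name first="Edgar A." last="Duenez-Guzman">Edgar A. Duenez-Guzman</name>
</region>
</country>
<country name="Mexique">
<noRegion>
<name first="Erik" last="Cuevas">Erik Cuevas</name>
</noRegion>
</country>
</tree>
"""


def authors_by_country(xml_text: str) -> dict:
    """Map each <country name="..."> to the author names found beneath it."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for country in root.findall("country"):
        # iter() walks all descendants in document order, so names nested
        # under <region> or <noRegion> and names placed directly under
        # <country> are all collected.
        result[country.get("name")] = [n.text for n in country.iter("name")]
    return result


groups = authors_by_country(TREE_XML)
```

Using `iter("name")` rather than `findall("name")` matters here because some `<name>` elements sit inside `<region>` or `<noRegion>` while others are direct children of `<country>`.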

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000536 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000536 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:11-0105971
   |texte=   Unsupervised measures for parameter selection of binarization algorithms
}}


This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024